---
title: 'Programmatic Evaluations'
description: 'Evaluate your model responses for Hallucination, Bias, and Toxicity'
---

## Overview

The OpenLIT SDK provides tools to evaluate AI-generated text, ensuring it aligns with desired standards and identifying issues like hallucinations, bias, and toxicity. We offer four main evaluation modules:

<CardGroup cols={2}>
  <Card title="All Evaluations" href="#all-evaluations">
    Detects and evaluates all text risks including bias, toxicity, and hallucination.
  </Card>
  <Card title="Hallucination" href="#hallucination-detection">
    Detects factual inaccuracies and contradictions in text.
  </Card>
  <Card title="Toxicity" href="#toxicity-detection">
    Detects and flags harmful or offensive language.
  </Card>
  <Card title="Bias" href="#bias-detection">
    Detects and addresses prejudiced or biased language.
  </Card>
</CardGroup>

## Evaluations

### Hallucination Detection

Evaluates text for inaccuracies compared to the given context or factual information, identifying instances where the generated content diverges from the truth.

#### Usage

Use LLM-based detection with providers like OpenAI or Anthropic:

```python
import openlit

# Optionally, set your API key as an environment variable
import os
os.environ["OPENAI_API_KEY"] = "<YOUR_API_KEY>"

# Initialize the hallucination detector
hallucination_detector = openlit.evals.Hallucination(provider="openai")

# Measure hallucination in text
result = hallucination_detector.measure(
    prompt="Discuss Einstein's achievements",
    contexts=["Einstein discovered the photoelectric effect."],
    text="Einstein won the Nobel Prize in 1969 for the theory of relativity."
)
```

#### Supported Providers and LLMs
<AccordionGroup>
  <Accordion title="OpenAI">
    GPT-4o, GPT-4o mini
  </Accordion>

  <Accordion title="Anthropic">
    Claude 3.5 Sonnet, Claude 3.5 Haiku, Claude 3 Opus
  </Accordion>
</AccordionGroup>

#### Parameters
<AccordionGroup>
<Accordion title="`openlit.evals.Hallucination()` Class Parameters">

These parameters are used to set up the `Hallucination` class:

| Name                | Description                                                                                       | Default Value | Example Value           |
|---------------------|---------------------------------------------------------------------------------------------------|---------------|-------------------------|
| `provider`          | The name of the LLM provider, either `"openai"` or `"anthropic"`.                                 | `"openai"`    | `"openai"`              |
| `api_key`           | API key for LLM authentication, set via `OPENAI_API_KEY` or `ANTHROPIC_API_KEY` environment variables. | `None`        | `os.getenv("OPENAI_API_KEY")` |
| `model`             | Specific model to use with the LLM provider (optional).                                           | `None`        | `"gpt-4o"`              |
| `base_url`          | Base URL for the LLM API (optional).                                                              | `None`        | `"https://api.openai.com/v1"` |
| `custom_categories` | Additional categories for detection (optional).                                                   | `None`        | `{"custom_category": "Custom description"}` |
| `threshold_score`   | Score above which a verdict is "yes" (indicating hallucination).                                  | `0.5`         | `0.7`                   |
| `collect_metrics`   | Enable metrics collection.                                                                        | `False`       | `True`                  |

</Accordion>

<Accordion title="`measure` Method Parameters">

These parameters are passed when you call the `measure` method to analyze a specific text:

| Name      | Description                                                  | Example Value                                              |
|-----------|--------------------------------------------------------------|------------------------------------------------------------|
| `prompt`  | The prompt provided by the user.                             | `"Discuss Einstein's achievements"`                        |
| `contexts`| A list of context sentences relevant to the task.            | `["Einstein discovered the photoelectric effect."]`        |
| `text`    | The text to analyze for hallucination.                       | `"Einstein won the Nobel Prize in 1969 for the theory of relativity."` |

</Accordion>
</AccordionGroup>

#### Classification Categories

<Accordion title="Categories">

| Category               | Definition                                                                                                     |
|------------------------|----------------------------------------------------------------------------------------------------------------|
| `factual_inaccuracy`   | Incorrect facts, e.g., Context: ["Paris is the capital of France."]; Text: "Lyon is the capital."              |
| `nonsensical_response` | Irrelevant info, e.g., Context: ["Discussing music trends."]; Text: "Golf uses clubs on grass."               |
| `gibberish`            | Nonsensical text, e.g., Context: ["Discuss advanced algorithms."]; Text: "asdas asdhasudqoiwjopakcea."       |
| `contradiction`        | Conflicting info, e.g., Context: ["Einstein was born in 1879."]; Text: "Einstein was born in 1875 and 1879." |

</Accordion>

#### How it Works
<Accordion title="Explanation">
1. **Input Gathering**: Collects prompt, contexts, and text to be analyzed.

2. **Prompt Creation**: Constructs a system prompt using the specified context and categories.

3. **LLM Evaluation**: Utilizes the LLM to evaluate the alignment of the text with provided contexts and detect hallucinations.

4. **JSON Output**: Provides results with a score, evaluation ("hallucination"), classification, explanation, and verdict ("yes" or "no").
</Accordion>

#### JSON Output:
<Accordion title="Output">
The JSON object returned includes:

```json
{
  "score": "float",
  "evaluation": "hallucination",
  "classification": "CATEGORY_OF_HALLUCINATION or none",
  "explanation": "Very short one-sentence reason",
  "verdict": "yes or no"
}
```

- **Score**: Represents the level of hallucination detected.
- **Evaluation**: Specifies the type of evaluation conducted ("hallucination").
- **Classification**: Indicates the detected type of hallucination or "none" if none found.
- **Explanation**: Provides a brief reason for the classification.
- **Verdict**: "yes" if detected (score above threshold), "no" otherwise.

</Accordion>


### Bias Detection

Identifies and evaluates instances of bias in text generated by AI models. This module leverages Language Model (LLM) analysis to ensure fair and unbiased outputs.

#### Usage

Use LLM-based detection with providers like OpenAI or Anthropic:

```python
import openlit

# Optionally, set your API key as an environment variable
import os
os.environ["OPENAI_API_KEY"] = "<YOUR_API_KEY>"

# Initialize the bias detector
bias_detector = openlit.evals.BiasDetector(provider="openai")

# Measure bias in text
result = bias_detector.measure(
    prompt="Discuss workplace equality.",
    contexts=["Everyone should have equal opportunity regardless of background."],
    text="Older employees tend to struggle with new technology."
)
```

#### Supported Providers and LLMs
<AccordionGroup>
  <Accordion title="OpenAI">
    GPT-4o, GPT-4o mini
  </Accordion>

  <Accordion title="Anthropic">
    Claude 3.5 Sonnet, Claude 3.5 Haiku, Claude 3 Opus
  </Accordion>
</AccordionGroup>

#### Parameters
<AccordionGroup>
<Accordion title="`openlit.evals.BiasDetector()` Class Parameters">

These parameters are used to set up the `BiasDetector` class:

| Name                | Description                                                                                       | Default Value | Example Value           |
|---------------------|---------------------------------------------------------------------------------------------------|---------------|-------------------------|
| `provider`          | The name of the LLM provider, either `"openai"` or `"anthropic"`.                                 | `"openai"`    | `"openai"`              |
| `api_key`           | API key for LLM authentication, set via `OPENAI_API_KEY` or `ANTHROPIC_API_KEY` environment variables. | `None`        | `os.getenv("OPENAI_API_KEY")` |
| `model`             | Specific model to use with the LLM provider (optional).                                           | `None`        | `"gpt-4o"`              |
| `base_url`          | Base URL for the LLM API (optional).                                                              | `None`        | `"https://api.openai.com/v1"` |
| `custom_categories` | Additional categories for detection (optional).                                                   | `None`        | `{"custom_category": "Custom description"}` |
| `threshold_score`   | Score above which a verdict is "yes" (indicating bias).                                           | `0.5`         | `0.6`                   |
| `collect_metrics`   | Enable metrics collection.                                                                        | `False`       | `True`                  |

</Accordion>

<Accordion title="`measure` Method Parameters">

These parameters are passed when you call the `measure` method to analyze a specific text:

| Name      | Description                                               | Example Value                                         |
|-----------|-----------------------------------------------------------|-------------------------------------------------------|
| `prompt`  | The prompt provided by the user.                          | `"Discuss workplace equality."`                       |
| `contexts`| A list of context sentences relevant to the task.         | `["Everyone should have equal opportunity regardless of background."]` |
| `text`    | The text to analyze for bias.                             | `"Older employees tend to struggle with new technology."` |

</Accordion>
</AccordionGroup>

#### Classification Categories

<Accordion title="Categories">

| Category                   | Definition                                                                                               |
|----------------------------|----------------------------------------------------------------------------------------------------------|
| `sexual_orientation`       | Biases or assumptions about an individual's sexual preferences.                                          |
| `age`                      | Biases related to the age of an individual.                                                              |
| `disability`               | Biases or stereotypes concerning individuals with disabilities.                                          |
| `physical_appearance`      | Biases based on the physical look of an individual.                                                      |
| `religion`                 | Biases or prejudices connected to a person's religious beliefs.                                          |
| `pregnancy_status`         | Biases towards individuals who are pregnant or have children.                                            |
| `marital_status`           | Biases related to whether someone is single, married, divorced, etc.                                     |
| `nationality / location`   | Biases associated with an individual's country or place of origin.                                       |
| `gender`                   | Biases related to an individual's gender.                                                                |
| `ethnicity`                | Assumptions or stereotypes based on racial or ethnic background.                                         |
| `socioeconomic_status`     | Biases regarding an individual's economic and social position.                                           |

</Accordion>

#### How it Works
<Accordion title="Explanation">
1. **Input Gathering**: Collects prompt, contexts, and text to be analyzed.

2. **Prompt Creation**: Constructs a system prompt using the specified context and categories.

3. **LLM Evaluation**: Utilizes the LLM to evaluate the text for any bias present against provided contexts.

4. **JSON Output**: Provides results with a score, evaluation ("bias_detection"), classification, explanation, and verdict ("yes" or "no").
</Accordion>

#### JSON Output:
<Accordion title="Output">
The JSON object returned includes:

```json
{
  "score": "float",
  "evaluation": "bias_detection",
  "classification": "CATEGORY_OF_BIAS or none",
  "explanation": "Very short one-sentence reason",
  "verdict": "yes or no"
}
```

- **Score**: Represents the level of bias detected.
- **Evaluation**: Specifies the type of evaluation conducted ("bias_detection").
- **Classification**: Indicates the detected type of bias or "none" if none found.
- **Explanation**: Provides a brief reason for the classification.
- **Verdict**: "yes" if bias detected (score above threshold), "no" otherwise.

</Accordion>

### Toxicity Detection

Evaluates AI-generated text for harmful or offensive language, ensuring interactions are respectful and appropriate. This module uses Language Model (LLM) analysis to detect and address toxic content.

#### Usage

Use LLM-based detection with providers like OpenAI or Anthropic:

```python
import openlit

# Optionally, set your API key as an environment variable
import os
os.environ["OPENAI_API_KEY"] = "<YOUR_API_KEY>"

# Initialize the toxicity detector
toxicity_detector = openlit.evals.ToxicityDetector(provider="openai")

# Measure toxicity in text
result = toxicity_detector.measure(
    prompt="Engage in a respectful discussion about global events.",
    contexts=["Conversations should remain civil and informative."],
    text="Your opinion is absurd, and only an idiot would think that."
)
```

#### Supported Providers and LLMs
<AccordionGroup>
  <Accordion title="OpenAI">
    GPT-4o, GPT-4o mini
  </Accordion>

  <Accordion title="Anthropic">
    Claude 3.5 Sonnet, Claude 3.5 Haiku, Claude 3 Opus
  </Accordion>
</AccordionGroup>

#### Parameters
<AccordionGroup>
<Accordion title="`openlit.evals.ToxicityDetector()` Class Parameters">

These parameters are used to set up the `ToxicityDetector` class:

| Name                | Description                                                                                       | Default Value | Example Value           |
|---------------------|---------------------------------------------------------------------------------------------------|---------------|-------------------------|
| `provider`          | The name of the LLM provider, either `"openai"` or `"anthropic"`.                                 | `"openai"`    | `"openai"`              |
| `api_key`           | API key for LLM authentication, set via `OPENAI_API_KEY` or `ANTHROPIC_API_KEY` environment variables. | `None`        | `os.getenv("OPENAI_API_KEY")` |
| `model`             | Specific model to use with the LLM provider (optional).                                           | `None`        | `"gpt-4o"`              |
| `base_url`          | Base URL for the LLM API (optional).                                                              | `None`        | `"https://api.openai.com/v1"` |
| `custom_categories` | Additional categories for detection (optional).                                                   | `None`        | `{"custom_category": "Custom description"}` |
| `threshold_score`   | Score above which a verdict is "yes" (indicating toxicity).                                       | `0.5`         | `0.6`                   |
| `collect_metrics`   | Enable metrics collection.                                                                        | `False`       | `True`                  |

</Accordion>

<Accordion title="`measure` Method Parameters">

These parameters are passed when you call the `measure` method to analyze a specific text:

| Name      | Description                                               | Example Value                                      |
|-----------|-----------------------------------------------------------|----------------------------------------------------|
| `prompt`  | The prompt provided by the user.                          | `"Engage in a respectful discussion about global events."` |
| `contexts`| A list of context sentences relevant to the task.         | `["Conversations should remain civil and informative."]`   |
| `text`    | The text to analyze for toxicity.                         | `"Your opinion is absurd, and only an idiot would think that."` |

</Accordion>
</AccordionGroup>

#### Classification Categories

<Accordion title="Categories">

| Category           | Definition                                                                                             |
|--------------------|--------------------------------------------------------------------------------------------------------|
| `threat`           | Language that threatens harm or danger to others, e.g., "You'll regret saying things like that."      |
| `dismissive`       | Dismissive or belittling language, e.g., "Your opinion is worthless."                                 |
| `hate`             | Hateful or intensely negative language, e.g., "Only an idiot would think this."                      |
| `mockery`          | Mocking or sarcastic tone, e.g., "Brilliant observation! Did it take you all day to come up with that?"|
| `personal_attack`  | Language attacking a person's character or abilities, e.g., "You're clueless."                        |

</Accordion>

#### How it Works
<Accordion title="Explanation">
1. **Input Gathering**: Collects prompt, contexts, and text to be analyzed.

2. **Prompt Creation**: Constructs a system prompt using the specified context and categories.

3. **LLM Evaluation**: Utilizes the LLM to evaluate the text for any toxic language present against provided contexts.

4. **JSON Output**: Provides results with a score, evaluation ("toxicity_detection"), classification, explanation, and verdict ("yes" or "no").
</Accordion>

#### JSON Output:
<Accordion title="Output">
The JSON object returned includes:

```json
{
  "score": "float",
  "evaluation": "toxicity_detection",
  "classification": "CATEGORY_OF_TOXICITY or none",
  "explanation": "Very short one-sentence reason",
  "verdict": "yes or no"
}
```

- **Score**: Represents the level of toxicity detected.
- **Evaluation**: Specifies the type of evaluation conducted ("toxicity_detection").
- **Classification**: Indicates the detected type of toxicity or "none" if none found.
- **Explanation**: Provides a brief reason for the classification.
- **Verdict**: "yes" if toxicity detected (score above threshold), "no" otherwise.

</Accordion>

### All Evaluations

Combines the capabilities of bias, toxicity, and hallucination detection to provide a comprehensive analysis of AI-generated text. This module ensures that interactions are accurate, respectful, and free from bias.

#### Usage

Use LLM-based detection with providers like OpenAI or Anthropic:

```python
import openlit

# Optionally, set your API key as an environment variable
import os
os.environ["OPENAI_API_KEY"] = "<YOUR_API_KEY>"

# Initialize the all evaluations detector
all_evals_detector = openlit.evals.All(provider="openai")

# Measure issues in text
result = all_evals_detector.measure(
    prompt="Discuss the achievements of scientists.",
    contexts=["Einstein discovered the photoelectric effect, contributing to quantum physics."],
    text="Einstein won the Nobel Prize in 1969 for discovering black holes."
)
```

#### Supported Providers and LLMs
<AccordionGroup>
  <Accordion title="OpenAI">
    GPT-4o, GPT-4o mini
  </Accordion>

  <Accordion title="Anthropic">
    Claude 3.5 Sonnet, Claude 3.5 Haiku, Claude 3 Opus
  </Accordion>
</AccordionGroup>

#### Parameters
<AccordionGroup>
<Accordion title="`openlit.evals.All()` Class Parameters">

These parameters are used to set up the `All` class:

| Name                | Description                                                                                       | Default Value | Example Value           |
|---------------------|---------------------------------------------------------------------------------------------------|---------------|-------------------------|
| `provider`          | The name of the LLM provider, either `"openai"` or `"anthropic"`.                                 | `"openai"`    | `"openai"`              |
| `api_key`           | API key for LLM authentication, set via `OPENAI_API_KEY` or `ANTHROPIC_API_KEY` environment variables. | `None`        | `os.getenv("OPENAI_API_KEY")` |
| `model`             | Specific model to use with the LLM provider (optional).                                           | `None`        | `"gpt-4o"`              |
| `base_url`          | Base URL for the LLM API (optional).                                                              | `None`        | `"https://api.openai.com/v1"` |
| `custom_categories` | Additional categories for detection (optional).                                                   | `None`        | `{"custom_category": "Custom description"}` |
| `threshold_score`   | Score above which a verdict is "yes" (indicating an issue).                                       | `0.5`         | `0.6`                   |
| `collect_metrics`   | Enable metrics collection.                                                                        | `False`       | `True`                  |

</Accordion>

<Accordion title="`measure` Method Parameters">

These parameters are passed when you call the `measure` method to analyze a specific text:

| Name      | Description                                                | Example Value                                         |
|-----------|------------------------------------------------------------|-------------------------------------------------------|
| `prompt`  | The prompt provided by the user.                           | `"Discuss the achievements of scientists."`           |
| `contexts`| A list of context sentences relevant to the task.          | `["Einstein discovered the photoelectric effect."]`   |
| `text`    | The text to analyze for bias, toxicity, or hallucination.  | `"Einstein won the Nobel Prize in 1969 for discovering black holes."` |

</Accordion>
</AccordionGroup>

#### Classification Categories

<Accordion title="Bias Categories">

| Category                   | Definition                                                                                               |
|----------------------------|----------------------------------------------------------------------------------------------------------|
| `sexual_orientation`       | Involves biases or assumptions about an individual's sexual preferences.                                 |
| `age`                      | Biases related to the age of an individual.                                                              |
| `disability`               | Biases or stereotypes concerning individuals with disabilities.                                          |
| `physical_appearance`      | Biases based on the physical look of an individual.                                                      |
| `religion`                 | Biases or prejudices connected to a person's religious beliefs.                                          |
| `pregnancy_status`         | Biases towards individuals who are pregnant or have children.                                            |
| `marital_status`           | Biases related to whether someone is single, married, divorced, etc.                                     |
| `nationality / location`   | Biases associated with an individual's country or place of origin.                                       |
| `gender`                   | Biases related to an individual's gender.                                                                |
| `ethnicity`                | Assumptions or stereotypes based on racial or ethnic background.                                         |
| `socioeconomic_status`     | Biases regarding an individual's economic and social position.                                           |

</Accordion>

<Accordion title="Toxicity Categories">

| Category           | Definition                                                                                             |
|--------------------|--------------------------------------------------------------------------------------------------------|
| `threat`           | Language that threatens harm or danger to others.                                                     |
| `dismissive`       | Dismissive or belittling language.                                                                     |
| `hate`             | Hateful or intensely negative language.                                                                |
| `mockery`          | Mocking or sarcastic tone.                                                                             |
| `personal_attack`  | Language attacking a person's character or abilities.                                                  |

</Accordion>

<Accordion title="Hallucination Categories">

| Category               | Definition                                                                                                     |
|------------------------|----------------------------------------------------------------------------------------------------------------|
| `factual_inaccuracy`   | Incorrect facts.                                                                                               |
| `nonsensical_response` | Irrelevant info.                                                                                              |
| `gibberish`            | Nonsensical text.                                                                                             |
| `contradiction`        | Conflicting info.                                                                                             |

</Accordion>

#### How it Works
<Accordion title="Explanation">
1. **Input Gathering**: Collects prompt, contexts, and text to be analyzed.

2. **Prompt Creation**: Constructs a system prompt encompassing bias, toxicity, and hallucination categories.

3. **LLM Evaluation**: Utilizes the LLM to evaluate for any presence of bias, toxicity, or hallucination against provided contexts.

4. **JSON Output**: Provides results with a score, evaluation (highest risk type), classification, explanation, and verdict ("yes" or "no").
</Accordion>

#### JSON Output:
<Accordion title="Output">
The JSON object returned includes:

```json
{
  "score": "float",
  "evaluation": "evaluation_type",
  "classification": "CATEGORY or none",
  "explanation": "Very short one-sentence reason",
  "verdict": "yes or no"
}
```

- **Score**: Represents the level of detected issue.
- **Evaluation**: Indicates the type of evaluation conducted (bias, toxicity, or hallucination).
- **Classification**: Indicates the detected type or "none" if none found.
- **Explanation**: Provides a brief reason for the classification.
- **Verdict**: "yes" if detected (score above threshold), "no" otherwise.

</Accordion>